The goal of this project is to explore the trends and characteristics of Vancouver street trees over the years. The questions of interest will be analyzed and presented through a series of data visualizations.
The questions of interest are as follows:
I am interested in how the total number of trees planted in Vancouver has changed over the years, and whether more or fewer trees were planted in recent years than in earlier ones. I would also like to analyze which tree species are most commonly planted across Vancouver and which species are most common in each neighbourhood. Furthermore, analyzing the distribution of tree diameter and height could show whether different neighbourhoods have different species of trees, possibly because of demographic and land disparities.
The dataset I will be focusing on in this project is a subset of the Vancouver Street Trees dataset. It was obtained from the City of Vancouver's Open Data Portal and is released under an Open Government Licence.
The dataset used in this analysis is:
First, we import all the libraries required for our analysis.
#Import required libraries
import altair as alt
import numpy as np
import pandas as pd
import string
#alt.data_transformers.enable('default', max_rows=1000000)
import json
# Import required files
trees_df = pd.read_csv('small_unique_vancouver.csv', parse_dates=["date_planted"])
trees_df.head()
| | Unnamed: 0 | std_street | on_street | species_name | neighbourhood_name | date_planted | diameter | street_side_name | genus_name | assigned | ... | plant_area | curb | tree_id | common_name | height_range_id | on_street_block | cultivar_name | root_barrier | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10747 | W 20TH AV | W 20TH AV | PLATANOIDES | Riley Park | 2000-02-23 | 28.5 | EVEN | ACER | N | ... | 15 | Y | 21421 | NORWAY MAPLE | 4 | 0 | NaN | N | 49.252711 | -123.106323 |
| 1 | 12573 | W 18TH AV | W 18TH AV | CALLERYANA | Arbutus-Ridge | 1992-02-04 | 6.0 | ODD | PYRUS | N | ... | 7 | Y | 129645 | CHANTICLEER PEAR | 2 | 2300 | CHANTICLEER | N | 49.256350 | -123.158709 |
| 2 | 29676 | ROSS ST | ROSS ST | NIGRA | Sunset | NaT | 12.0 | ODD | PINUS | N | ... | 7 | Y | 154675 | AUSTRIAN PINE | 4 | 7800 | NaN | N | 49.213486 | -123.083254 |
| 3 | 8856 | DOMAN ST | DOMAN ST | AMERICANA | Killarney | 1999-11-12 | 11.0 | EVEN | FRAXINUS | N | ... | 7 | Y | 180803 | AUTUMN APPLAUSE ASH | 4 | 6900 | AUTUMN APPLAUSE | N | 49.220839 | -123.036721 |
| 4 | 21098 | EAST BOULEVARD | EAST BOULEVARD | HIPPOCASTANUM | Shaughnessy | NaT | 15.5 | ODD | AESCULUS | Y | ... | N | Y | 74364 | COMMON HORSECHESTNUT | 4 | 5200 | NaN | N | 49.238514 | -123.154958 |
5 rows × 21 columns
trees_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 21 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Unnamed: 0          5000 non-null   int64
 1   std_street          5000 non-null   object
 2   on_street           5000 non-null   object
 3   species_name        5000 non-null   object
 4   neighbourhood_name  5000 non-null   object
 5   date_planted        2363 non-null   datetime64[ns]
 6   diameter            5000 non-null   float64
 7   street_side_name    5000 non-null   object
 8   genus_name          5000 non-null   object
 9   assigned            5000 non-null   object
 10  civic_number        5000 non-null   int64
 11  plant_area          4950 non-null   object
 12  curb                5000 non-null   object
 13  tree_id             5000 non-null   int64
 14  common_name         5000 non-null   object
 15  height_range_id     5000 non-null   int64
 16  on_street_block     5000 non-null   int64
 17  cultivar_name       2658 non-null   object
 18  root_barrier        5000 non-null   object
 19  latitude            5000 non-null   float64
 20  longitude           5000 non-null   float64
dtypes: datetime64[ns](1), float64(3), int64(5), object(12)
memory usage: 820.4+ KB
trees_df.describe()
| | Unnamed: 0 | diameter | civic_number | tree_id | height_range_id | on_street_block | latitude | longitude |
|---|---|---|---|---|---|---|---|---|
| count | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.00000 | 5000.000000 | 5000.000000 | 5000.000000 |
| mean | 14861.920400 | 12.340888 | 2975.707600 | 128682.584600 | 2.73440 | 2960.227000 | 49.247349 | -123.107128 |
| std | 8680.023278 | 9.266600 | 2078.580429 | 75412.260406 | 1.56957 | 2086.861052 | 0.021251 | 0.049137 |
| min | 2.000000 | 0.000000 | 2.000000 | 36.000000 | 0.00000 | 0.000000 | 49.202783 | -123.220560 |
| 25% | 7192.750000 | 4.000000 | 1300.500000 | 61321.500000 | 2.00000 | 1300.000000 | 49.230152 | -123.144178 |
| 50% | 14870.000000 | 10.000000 | 2639.000000 | 130130.500000 | 2.00000 | 2600.000000 | 49.247981 | -123.105861 |
| 75% | 22366.750000 | 18.000000 | 4123.000000 | 191332.000000 | 4.00000 | 4100.000000 | 49.263275 | -123.063484 |
| max | 29992.000000 | 71.000000 | 9113.000000 | 270750.000000 | 9.00000 | 9100.000000 | 49.293930 | -123.023311 |
The columns of interest are 'species_name', 'neighbourhood_name', 'date_planted', 'diameter', 'common_name', 'height_range_id', 'latitude', and 'longitude'. Of these, the common name of the tree species and the neighbourhood are especially important to the questions above.
The column "Unnamed: 0" will be dropped, as its purpose and relevance are uncertain.
The column 'date_planted' has 2637 null values out of 5000 rows, which is a concern and should be examined more closely. Because more than half of the planting dates are missing, graphs that use 'date_planted' may not be representative of the dataset as a whole. The missing values could be a result of data collection: records of when certain trees were planted may be missing, or some trees may have grown through seed dispersal, in which case no planting date would exist to record.
The column 'neighbourhood_name' will be renamed to 'name' for convenience, as it is used frequently throughout this report.
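The share of missing values per column can be checked directly rather than read off the `info()` output. The following is a minimal sketch on a small made-up frame; `toy_df` and its values are illustrative stand-ins, not rows taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for trees_df; the values are made up.
toy_df = pd.DataFrame({
    "date_planted": pd.to_datetime(["2000-02-23", None, "1992-02-04", None]),
    "diameter": [28.5, 12.0, 6.0, 15.5],
})

# Count and fraction of missing values for every column.
missing_summary = pd.DataFrame({
    "n_missing": toy_df.isna().sum(),
    "frac_missing": toy_df.isna().mean(),
})
print(missing_summary)
```

Applied to `trees_df`, the same calculation should report 2637 missing 'date_planted' values (5000 - 2363), in line with the `info()` output above.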
trees_df = trees_df.drop(columns='Unnamed: 0')
trees_df = trees_df.rename(columns={'neighbourhood_name':'name'})
trees_df
| | std_street | on_street | species_name | name | date_planted | diameter | street_side_name | genus_name | assigned | civic_number | plant_area | curb | tree_id | common_name | height_range_id | on_street_block | cultivar_name | root_barrier | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | W 20TH AV | W 20TH AV | PLATANOIDES | Riley Park | 2000-02-23 | 28.5 | EVEN | ACER | N | 66 | 15 | Y | 21421 | NORWAY MAPLE | 4 | 0 | NaN | N | 49.252711 | -123.106323 |
| 1 | W 18TH AV | W 18TH AV | CALLERYANA | Arbutus-Ridge | 1992-02-04 | 6.0 | ODD | PYRUS | N | 2323 | 7 | Y | 129645 | CHANTICLEER PEAR | 2 | 2300 | CHANTICLEER | N | 49.256350 | -123.158709 |
| 2 | ROSS ST | ROSS ST | NIGRA | Sunset | NaT | 12.0 | ODD | PINUS | N | 7855 | 7 | Y | 154675 | AUSTRIAN PINE | 4 | 7800 | NaN | N | 49.213486 | -123.083254 |
| 3 | DOMAN ST | DOMAN ST | AMERICANA | Killarney | 1999-11-12 | 11.0 | EVEN | FRAXINUS | N | 6938 | 7 | Y | 180803 | AUTUMN APPLAUSE ASH | 4 | 6900 | AUTUMN APPLAUSE | N | 49.220839 | -123.036721 |
| 4 | EAST BOULEVARD | EAST BOULEVARD | HIPPOCASTANUM | Shaughnessy | NaT | 15.5 | ODD | AESCULUS | Y | 5295 | N | Y | 74364 | COMMON HORSECHESTNUT | 4 | 5200 | NaN | N | 49.238514 | -123.154958 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4995 | E 53RD AV | E 53RD AV | SERRULATA | Victoria-Fraserview | NaT | 17.0 | EVEN | PRUNUS | N | 2248 | 9 | Y | 47059 | KWANZAN FLOWERING CHERRY | 2 | 2200 | KWANZAN | N | 49.221161 | -123.061023 |
| 4996 | E 32ND AV | E 32ND AV | XX | Kensington-Cedar Cottage | 2014-01-14 | 3.0 | EVEN | CORNUS | N | 1708 | 10 | N | 247874 | EDDIES WHITE WONDER DOGWOOD | 1 | 1700 | EDDIE'S WHITE WONDER | N | 49.241544 | -123.070644 |
| 4997 | DAWSON ST | DAWSON ST | TULIPIFERA | Killarney | 2002-04-15 | 3.5 | EVEN | LIRIODENDRON | N | 6512 | 7 | Y | 192642 | ARNOLD TULIPTREE | 2 | 6500 | ARNOLD | N | 49.224511 | -123.048723 |
| 4998 | E 13TH AV | E 13TH AV | INVOLUCRATA | Mount Pleasant | 2003-12-02 | 5.5 | EVEN | DAVIDIA | N | 360 | 5 | Y | 202500 | DOVE OR HANDKERCHIEF TREE | 1 | 300 | NaN | Y | 49.259208 | -123.096905 |
| 4999 | CULLODEN ST | CULLODEN ST | CAMPESTRE | Kensington-Cedar Cottage | NaT | 3.0 | ODD | ACER | N | 4565 | 8 | Y | 259433 | RED SHINE MAPLE | 1 | 4500 | RED SHINE | N | 49.243772 | -123.078967 |
5000 rows × 20 columns
# A general overview of the distribution of quantitative columns
numeric_cols = trees_df.select_dtypes('number').columns.tolist()
numeric_hist_plots = (alt.Chart(trees_df).mark_bar().encode(
alt.X(alt.repeat(), type='quantitative', bin=alt.Bin(maxbins=40)),
y='count()')
.properties(width=250, height=150)
.repeat(numeric_cols, columns=2, title='Numeric Columns'))
numeric_hist_plots
This is a general overview of the numerical columns in this dataset. The numerical columns most relevant to my questions of interest are "diameter", "height_range_id", "latitude", and "longitude". The categorical columns of interest are "species_name", "name" (formerly "neighbourhood_name"), and "common_name".
trees_df = trees_df.assign(year = trees_df['date_planted'].dt.year)
# Drop all null values in year column
trees_df_filtered = trees_df.dropna(subset=['year'])
tree_year_plot = alt.Chart(trees_df_filtered).mark_bar(color='steelblue').encode(
alt.X('year:N', title='Year', scale=alt.Scale(zero=False)),
alt.Y('count()', title='Number of Trees', scale=alt.Scale(zero=False)),
tooltip=[alt.Tooltip('count()', title="Number of Trees")]).properties(title='Fig 1. Number of Trees Planted in Vancouver Throughout the Years')
tree_year_plot
Figure 1 shows that the number of trees planted in Vancouver each year from 1990 to 2016 follows no particular trend. The most trees were planted in 1996, with 133 trees planted that year. It also seems that in recent years, after 2015, fewer trees are being planted on average. The fewest trees were planted in 2016, with only 6. This graph raises a further question: where have trees been planted the most over the years? Which neighbourhood has had the most trees planted? Let's take a look.
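The yearly totals read from the chart can also be tabulated with pandas. This is a minimal sketch on made-up planting dates; `toy_df` is a hypothetical stand-in for `trees_df`, and the counts it prints are not the real ones.

```python
import pandas as pd

# Hypothetical planting dates; None becomes NaT, i.e. no recorded date.
toy_df = pd.DataFrame({
    "date_planted": pd.to_datetime(
        ["1996-03-01", "1996-11-20", "2016-01-05", None, "2002-04-15", "1996-05-09"]
    )
})

# Extract the year, drop trees without a date, and count trees per year.
yearly_counts = (
    toy_df["date_planted"].dt.year
    .dropna()
    .astype(int)
    .value_counts()
    .sort_index()
)

# Year with the most dated plantings in the toy data.
busiest_year = yearly_counts.idxmax()
print(yearly_counts)
print(busiest_year, yearly_counts.max())
```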
neighbourhood_count_histogram_plot = alt.Chart(trees_df).mark_bar(color='steelblue').encode(
alt.X('name', title='Neighbourhood', sort='y'),
alt.Y('count()', title='Number of Trees'),
tooltip=[alt.Tooltip('count()', title="Number of Trees")]).properties(width=500, title='Fig 2. Number of Trees in each Neighbourhood in Vancouver')
neighbourhood_count_histogram_plot
Figure 2 makes it easy to see which neighbourhoods have the most and the fewest trees. Strathcona has the fewest, with 75 trees, and Renfrew-Collingwood has the most, with 384.
It may be useful to understand when trees were planted in particular neighbourhoods and to see whether there is a trend involved. Let's look even further!
interval = alt.selection_interval(encodings=['x'])
bar_slider = tree_year_plot.encode(
color=alt.condition(interval, alt.value('navy'), alt.value('lightgray'))).properties(
width=600,
height=100)
bar_slider = bar_slider.add_selection(
interval)
# Neighbourhood bar chart, filtered to the years brushed on the bar slider
bar_plot = neighbourhood_count_histogram_plot.encode(
    tooltip=[
        alt.Tooltip("count()", title="Number of Trees"),
        alt.Tooltip("name:N", title="Neighbourhood")]
).properties(width=600).transform_filter(interval)
combo_plot = (bar_slider & bar_plot).properties(title='Trees Are Planted In Renfrew-Collingwood the Most Throughout the Years')
combo_plot
This interactive graph lets us use Fig 1 as an interval slider to see when trees were planted in each neighbourhood. Moving the slider in intervals of 5 years at a time, we can see that in the earliest years of the dataframe, 1989-1993, the most trees were planted in Hastings-Sunrise. Moving along, the neighbourhood with the most trees planted fluctuates between Renfrew-Collingwood, Kensington-Cedar Cottage, and Hastings-Sunrise. Most of the intervals show Renfrew-Collingwood with the most trees planted, which isn't a surprise, as Fig 2 showed that Renfrew-Collingwood has the most trees overall across Vancouver. The most recent 5-year interval shows Hastings-Sunrise as the neighbourhood with the most planted trees.
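The readings taken from the slider can be cross-checked without the chart. The sketch below assumes a frame with the `name` and `year` columns created earlier; the helper `top_neighbourhood` and the toy rows are hypothetical and only illustrate the idea.

```python
import pandas as pd

def top_neighbourhood(df, start_year, end_year):
    """Return the neighbourhood with the most dated plantings in [start_year, end_year].

    Rows with a missing year never fall inside the window.
    Raises ValueError if no tree was planted in the window.
    """
    in_window = df[df["year"].between(start_year, end_year)]
    return in_window["name"].value_counts().idxmax()

# Hypothetical rows; the real trees_df has one row per tree.
toy_df = pd.DataFrame({
    "name": ["Hastings-Sunrise", "Hastings-Sunrise", "Renfrew-Collingwood",
             "Sunset", "Renfrew-Collingwood", "Renfrew-Collingwood"],
    "year": [1990, 1992, 1991, 1993, 1998, 1999],
})

early_leader = top_neighbourhood(toy_df, 1989, 1993)
later_leader = top_neighbourhood(toy_df, 1996, 2000)
print(early_leader, later_leader)
```

A fixed window like this mirrors one position of the brush, so stepping `start_year` forward reproduces the 5-year intervals described above.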
top_common_names = trees_df['common_name'].value_counts().index.tolist()
selected_common_names = top_common_names[:30]
filtered_df = trees_df[trees_df['common_name'].isin(selected_common_names)]
neighbourhoods = sorted(filtered_df['name'].unique())
dropdown_neighbour = alt.binding_select(name='Neighbourhood ', options=neighbourhoods)
select_neighbour = alt.selection_single(fields=['name'], bind={'name': dropdown_neighbour})
trees_neighbourhood_plot = alt.Chart(filtered_df).mark_bar().add_selection(select_neighbour).encode(
alt.X('common_name:O', title='Tree Common Name', sort='-y'),
alt.Y('count()', title='Number of Trees'),
#opacity=alt.condition(select_neighbour, alt.value(0.8), alt.value(0.08)),
color=alt.value('steelblue'),
tooltip=[alt.Tooltip("count()", title="Number of Trees")]).transform_filter(select_neighbour).properties(title='Fig. 3 Most Common Trees in Neighbourhoods in Vancouver')
trees_neighbourhood_plot
The graph above makes it easy to see which species are the most common in each neighbourhood. Only the 30 most common tree types across Vancouver are included, as the graph would otherwise be too long. The initial graph is an overall view of the most common trees in all neighbourhoods. We see that the most common tree throughout Vancouver is the Kwanzan Flowering Cherry, with 383 trees of this species planted.
Using the dropdown selection, we can filter by a specific neighbourhood and see how many trees of each of these 30 types are found there.
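The filtering behind Fig 3 happens in two steps: the most common names are chosen citywide, and only then is the chart restricted to one neighbourhood. A minimal sketch of that logic in plain pandas follows; the function name `counts_for_neighbourhood` and the toy rows are assumptions made for illustration.

```python
import pandas as pd

def counts_for_neighbourhood(df, neighbourhood, n_top=30):
    """Count trees per common name in one neighbourhood,
    restricted to the n_top most common names citywide."""
    top_names = df["common_name"].value_counts().index[:n_top]
    subset = df[df["common_name"].isin(top_names) & (df["name"] == neighbourhood)]
    return subset["common_name"].value_counts()

# Hypothetical rows standing in for trees_df.
toy_df = pd.DataFrame({
    "common_name": ["KWANZAN FLOWERING CHERRY"] * 3 + ["NORWAY MAPLE"] * 2 + ["AUSTRIAN PINE"],
    "name": ["Sunset", "Sunset", "Killarney", "Killarney", "Killarney", "Killarney"],
})

killarney_counts = counts_for_neighbourhood(toy_df, "Killarney", n_top=2)
print(killarney_counts)
```

In the toy data the pine is dropped even though it grows in Killarney, because it is not among the top names citywide. The same effect applies to Fig 3: a species that is common in one neighbourhood but rare overall will not appear.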
select_neighbour = alt.selection_multi(fields=['name'], bind='legend')
trees_hd_plot = alt.Chart(trees_df).mark_circle(size=20).encode(
alt.X('height_range_id:Q', scale=alt.Scale(zero=False), title='Height Range ID'),
alt.Y('diameter:Q', scale=alt.Scale(zero=False), title='Diameter (Inches)'),
color=alt.Color('name:N', title='Neighbourhood'),
tooltip=['common_name:N', 'name:N'],
opacity=alt.condition(select_neighbour, alt.value(0.8), alt.value(0))).add_selection(select_neighbour).properties(title='Fig 4. Trees Height and Diameter by Neighbourhood')
trees_hd_plot
In Fig 4, we can see the distribution of height range and diameter of trees in each neighbourhood. This is useful because it shows which neighbourhoods have the tallest and widest trees. On the City of Vancouver website, the diameter is defined as the diameter of the tree at breast height.
The legend is clickable, which allows us to view the distribution for each neighbourhood separately. It seems that Downtown has the trees with the smallest diameter and height, whereas Dunbar seems to have the most trees with large diameter and height.
I was also able to see that neighbourhoods on the West side of Vancouver, such as Dunbar, Kitsilano, Shaughnessy, and Kerrisdale, have more points located on the right of the graph than neighbourhoods in East Vancouver. This is a very interesting finding, as it follows a trend similar to one reported in an article, which states that trees in higher-income neighbourhoods are taller and have larger canopies.
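Visual impressions from a dense scatter plot can be backed up with a per-neighbourhood summary. The sketch below computes median diameter and median height range for each neighbourhood; `toy_df` holds made-up values, so the printed numbers say nothing about the real neighbourhoods.

```python
import pandas as pd

# Hypothetical measurements; the real values come from trees_df.
toy_df = pd.DataFrame({
    "name": ["Downtown", "Downtown", "Dunbar", "Dunbar"],
    "diameter": [3.0, 5.0, 20.0, 30.0],
    "height_range_id": [1, 1, 4, 6],
})

# Median diameter and height range per neighbourhood, largest diameter first.
size_summary = (
    toy_df.groupby("name")[["diameter", "height_range_id"]]
    .median()
    .sort_values("diameter", ascending=False)
)
print(size_summary)
```

The median is used here instead of the mean because a few very large trees would otherwise dominate a neighbourhood's summary.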
In Fig 1, we see that the number of trees planted in recent years is lower than in previous years. The earliest years in the dataset, 1989 to 1991, also have very few trees planted. There seems to be no clear trend in the number of trees planted in Vancouver over the years. However, we do see a higher number of trees planted from 1996 to 2002. This could be due to many different reasons, such as city planning initiatives or urban development projects. One thing to note is that, as we saw in the overview of the dataset columns, more than half of the "date_planted" values are null, and these rows were dropped to create this graph. The null values could be due to data collection methods, or to trees that grew through seed dispersal, for which no planting date could be recorded. As a result, this graph may not accurately depict the actual number of trees planted in Vancouver over the years.
We also saw the distribution of trees across the neighbourhoods of Vancouver in Fig 2. According to the dataset, Renfrew-Collingwood has the most trees while Strathcona has the fewest. There are many possible reasons why Renfrew-Collingwood has high tree planting activity, such as community initiatives, urban development projects, or environmental sustainability policies. Strathcona is also one of the oldest neighbourhoods in Vancouver, so its demographics or land use may be a factor in why the fewest trees are planted there.
We were then able to investigate how the number of trees planted in each neighbourhood changed over the years through an interactive graph. Looking at intervals of 5 years at a time, we can see that the neighbourhood with the most trees planted recently is Hastings-Sunrise. Over the years, Renfrew-Collingwood, Kensington-Cedar Cottage, and Hastings-Sunrise have had the most trees planted. This is interesting because these neighbourhoods border each other on the Vancouver map, perhaps indicating that trees are mostly planted on the East side of Vancouver rather than the West. An article shares the finding that "researchers found that more affluent neighbourhoods like Shaughnessy, Dunbar and West Point Grey had higher Local Restorative Nature scores". LRN scores in the article look at "three areas that promote mental well-being: refuge, wild nature and diversity". The index found that "more vulnerable areas, such as downtown and Strathcona, scored lower in LRN." We can see this clearly in Fig 2, where more vulnerable areas such as Strathcona have the fewest trees planted.
Furthermore, we saw in Fig 3 that the most common species of tree in Vancouver overall is the Kwanzan Flowering Cherry, with 383 trees of this species planted. According to this article, flowering cherry trees were given as a gift from Japan, to be placed in Stanley Park in honour of Japanese Canadians who served in World War I. There are now more than 43,000 cherry trees in Vancouver, perhaps explaining the abundance of cherry blossom festivities around Vancouver in springtime.
Through the last figure, Fig 4, we saw that the distribution of height and diameter of trees differs between neighbourhoods. There is a possible trend that trees on the West side of Vancouver are taller and have thicker diameters than trees on the East side. Another article shares the trend from Montreal, where "the wealthier your neighbourhood, the more likely you will be surrounded by trees". Researchers found that "more privileged neighbourhoods tend to have not only higher tree cover, but also a greater diversity of species". I wonder if this is the case in Vancouver, as it seems that more affluent neighbourhoods, such as those on the West side of Vancouver, have taller and thicker trees.
Further analysis and exploration should be conducted, as other questions arise from the information gathered through this analysis. For example, another question of interest could be how the most commonly planted species have changed over the years. It may also be useful to explore the dataset using a map of Vancouver to visualize the disparities between different areas of the city. Each neighbourhood can also be explored more thoroughly, as there are several relevant columns, for example std_street, on_street, street_side_name, and curb. Using geographical visualizations, we may be able to discover more trends in tree species and their height and diameter depending on where neighbourhoods are located in Vancouver.
dashboard = (
combo_plot|
(trees_hd_plot.properties(width=500) &
trees_neighbourhood_plot.properties(width=500))
)
dashboard
The code throughout this project draws on reference material obtained from Canvas through the Data Visualization course.
Data Source: a subset obtained from the City of Vancouver website
Articles used
https://www.cbc.ca/news/canada/montreal/montreal-trees-inequality-canada-1.6175204
Wikipedia article on the list of neighbourhoods in Vancouver: https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Vancouver
Image of Map of Vancouver: https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Vancouver#/media/File:Vancouver_neighbourhoods.jpg